ABSTRACT

We present ZEBRA, the Zurich Extragalactic Bayesian Redshift Analyzer.

The current version of ZEBRA combines and extends several of the
classical approaches to produce accurate photometric redshifts down
to faint magnitudes. In particular, ZEBRA uses the template-fitting
approach to produce Maximum Likelihood and Bayesian redshift
estimates based on:

(1.) An automatic iterative technique to correct the original set of galaxy templates 
to best represent the SEDs of real galaxies at different redshifts; 
(2.) A training set of spectroscopic redshifts for a small fraction of the photometric 
sample to improve the robustness of the photometric redshift estimates; and 
(3.) An iterative technique for Bayesian redshift estimates, which extracts the full 
two-dimensional redshift and template probability function for each galaxy. 

We demonstrate the performance of ZEBRA by applying it to a sample
of 866 $I_{AB} \le 22.5$ COSMOS galaxies with available u*, B, V,
g', r', i', z' and $K_s$ photometry and zCOSMOS spectroscopic
redshifts in the range $0 < z < 1.3$. Adopting a $5\sigma$-clipping
that excludes $\lesssim 10$ galaxies, both the Maximum Likelihood and
Bayesian ZEBRA estimates for this sample have an accuracy
$\sigma_{\Delta z/(1+z)}$ smaller than 0.03. Similar accuracies are
recovered using mock galaxies.

ZEBRA is made available to the public at: http://www.exp-astro.phys.ethz.ch/ZEBRA

1 INTRODUCTION

Current imaging surveys of faint high redshift galaxies such
as COSMOS (Scoville et al. 2006) already return millions of
galaxies with magnitudes well beyond the current observational
spectroscopic limits. As spectroscopic redshifts for such large
distant galaxy samples will thus remain practically unobtainable
in the foreseeable future, photometric redshifts of increasing
accuracy will have to be constructed in order to properly
exploit the wealth of information, as a function of cosmic
epoch, that is potentially extractable from state-of-the-art
and future large imaging surveys.

The importance of estimating accurate redshifts from
multi-wavelength medium- and broad-band photometry for
large galaxy samples is reflected in the extensive efforts that
have been devoted to improving algorithms and methodologies
(... references therein). These works are based on a few basic
principles, namely:

(i) $\chi^2$ minimization of the difference between a
model-galaxy Spectral Energy Distribution (SED; the model SEDs
are hereafter referred to as templates) and the observed
galaxy photometry;

(ii) Neural network approaches that rely on the availability
of a small sample of spectroscopic redshifts to find a
functional dependence between photometric data and redshifts;

(iii) Hybrid approaches that perform standard $\chi^2$
minimization while using a small spectroscopic training sample
to optimize the initial set of galaxy templates;

(iv) Bayesian methods which use additional information
provided by a prior to obtain final photometric redshift
estimates.

Motivated by the scientific returns of deriving accurate 
photometric redshifts for large numbers of faint COSMOS 
galaxies, we have developed ZEBRA, the Zurich Extragalac- 
tic Bayesian Redshift Analyzer. In this paper we describe 
the current version of ZEBRA, which combines and extends 
several of the above-mentioned approaches to produce accu- 
rate photometric redshifts down to faint magnitudes. More 
specifically, the paper is structured as follows: 

Section 2, About ZEBRA, specifies the input requirements
and the output of ZEBRA.

Section 3, The principles of ZEBRA, describes the general
design and methodological details of the code. A flow chart
indicating the architectural structure of ZEBRA is shown in
Figure 1. Basically, ZEBRA produces two separate estimates
for the photometric redshifts of individual galaxies: a
Maximum Likelihood (ML) estimate, and a Bayesian (BY)
estimate. These achieve a high accuracy by combining
some novel features with several of the approaches that have
been published in the literature. In particular, ZEBRA:

- Uses a novel automatic iterative technique to correct
an original set of galaxy templates to best represent the
SEDs of real galaxies at different redshifts. These template
corrections depend on the accuracies and systematic errors
in the absolute photometric calibrations; therefore, prior to
performing the individual template corrections, ZEBRA
automatically removes systematic calibration errors in the
input photometric catalogs. The template corrections
substantially reduce the photometric redshift inaccuracies that
are generated by galaxy-template mismatches;

- Can be fed with a training set of spectroscopic redshifts 
for a small fraction of the photometric sample, to improve 
the robustness of the photometric redshift estimates; 

- Adopts an iterative technique for Bayesian photomet- 
ric redshift estimates that extracts the full two-dimensional 
redshift and template likelihood function for each galaxy. 

Section 4, The first application of ZEBRA, demonstrates
the performance of ZEBRA by comparing our photometric
redshift estimates for a sample of 866 $I_{\rm HST,AB} \le 22.5$
ACS-selected COSMOS galaxies with high-quality zCOSMOS
spectroscopic redshifts $z_{\rm spec} \lesssim 1.3$ (Lilly et al.
2006). Based on the currently available passbands and
photometric accuracies, both the ML and BY ZEBRA photo-z's for
COSMOS galaxies have a $5\sigma$-clipped accuracy of
$\Delta z/(1+z) = 0.027$ over the entire redshift range (with
~ 1% outliers).

Section 5, Concluding remarks, briefly comments on the
first applications of the COSMOS ZEBRA photometric
redshifts, and lists the developments which we are already
implementing in the next version of ZEBRA.

The three Appendices introduce the notation and conventions
that we use throughout the paper (Appendix A), present in
detail the explicit mathematical formulation of ZEBRA's
algorithms (Appendix B), and demonstrate the ZEBRA
performance on a mock catalogue produced for the COSMOS
survey (courtesy of Manfred Kitzbichler; Appendix C).

ZEBRA is made available to the general public at
http://www.exp-astro.phys.ethz.ch/ZEBRA ^1.
The use of ZEBRA should be acknowledged with an explicit
reference to this paper in the bibliographic list of any
resulting publication.



2 ABOUT ZEBRA 

ZEBRA accepts as input: 

(i) A photometric catalog containing medium- and
broad-band photometric data for each galaxy of the sample
under study;

(ii) The filter transmission curves corresponding to the
passbands of the photometric catalogue; and

(iii) An initial set of templates.

Optimally, photometric errors should also be included in 
the photometric catalog; however, it is possible to set errors 
to a user-specified value. Some frequently used templates 
and filter curves are already provided within ZEBRA. 

ZEBRA offers a variety of output information depending 
on the program configuration: 

- When run in photometry-check mode (Section 3.1),
the program corrects the input photometric catalog for any
systematic calibration error, and returns detailed
information about the applied corrections.

- In template-optimization mode (Section 3.2), ZEBRA
returns the corrected templates as tables of wavelength versus
spectral flux density.

- In the Maximum-Likelihood mode (Section 3.3), the
main output consists of the best fit redshift and template
type for each galaxy in the photometric catalogue, together
with their confidence limits estimated from constant $\chi^2$
boundaries. Additionally, the program returns: (i) the
minimum $\chi^2$, (ii) the normalization of the best fit
template, (iii) the rest-frame B-band magnitude and (iv) the
luminosity distance. If specified by the user, further
information is accessible, e.g. the likelihood functions for all
galaxies in several output formats, and the residuals between
best fit template magnitude and observed magnitude for each
galaxy in each passband.

- In the Bayesian mode (Section 3.4), ZEBRA calculates
the 2D-prior in redshift and template space in an iterative
fashion. This final prior (and, if specified, the interim prior of
each iteration step) is provided, together with the posterior
for each galaxy. The posterior can be saved as a full 2D-table
or in marginalized form. ZEBRA's output also lists the most
probable redshift and template type for each galaxy, as
defined by the maximum of (i) the 2D posterior or (ii) the
posterior after marginalizing over template types and
redshifts, respectively. The errors are calculated directly from
the posterior.

- ZEBRA can also derive and return K-corrections based
on the specified templates and filters.

All input and output files of ZEBRA are ASCII files.

^1 The ZEBRA website is currently under construction.

Figure 1. The architectural design of ZEBRA: the individual
components are described in Section 3. The calibration of the
photometric catalog and the automatic template corrections are
performed by running ZEBRA in its photometry-check and
template-optimization modes, respectively. The boxes
corresponding to the innovative components of ZEBRA are shown
in red, in particular the automatic template correction module
and the two-dimensional Bayesian module. The output of ZEBRA
is indicated by blue boxes.

Overall, ZEBRA is designed in a flexible way allowing 
all key-parameters to be user-defined. A detailed updated 
description of ZEBRA's input and output, and a manual ex- 
plaining its use, can be found at ZEBRA's URL. 



3 THE PRINCIPLES OF ZEBRA 

3.1 Step 0: Correction of systematic calibration 
errors in the input photometric catalogues 

In principle, with perfectly calibrated photometry, ZEBRA
can be run directly in the template-optimization mode, so
as to determine the optimal corrections to the original
templates that allow the SEDs of galaxies to be properly
reproduced at all redshifts. If present, however, systematic
calibration errors in the input photometric catalogs
deteriorate the quality of the photometric redshift estimates.
Such calibration errors can be easily identified, as they lead
to residuals between best-fit template and galaxy fluxes which
are independent of template type.

The photometry-check mode of ZEBRA offers the possibility
of correcting for any such possible systematic calibration
error before performing any correction to the shape of the
individual templates (as done, e.g., in Benítez (2000) and
Capak et al.; see also Coe et al. (2006) for an application).
In particular, the photometry-check mode of ZEBRA:

- Computes, for each galaxy i and for each filter n, the
difference $\Delta{\rm mag}_{n,i}$ between the magnitude of the
best-fit template and the observed magnitude ${\rm mag}_{n,i}$ of
the galaxy in that filter. ^2

- Fits, separately for each passband but independent
of the template, the dependence of the $\Delta{\rm mag}$ residuals
on the observed galaxy magnitude. A constant shift, a linear or
a higher order regression can be separately applied to each of
the $\Delta{\rm mag}$ vs mag relations.

- Applies the derived corrections to each photometric
set of data before re-iterating the procedure. The
photometric corrections clearly depend on the input templates.
Hence, it is important to ensure that the initial set of
templates is well adapted to the galaxy types in the catalog and
adequately covers the wavelength range which encompasses
all passbands at all relevant redshifts. Furthermore, a
photometric shift in one passband may lead to a change in the
normalization of the template fits; thus, a faster convergence
of the iterative procedure can be reached by temporarily
increasing the relative error in the specific passband. Tests
performed by adding artificial offsets to our photometric data
(observed and mock, see Section 4.1 and Appendix C) show
that, with this extra step, convergence is always achieved as
long as (i) not all bands need significant photometric
corrections (i.e. much larger than the photometric error), and
(ii) the passbands are not strongly correlated, e.g. they should
not overlap. In Appendix C we further discuss these issues.

The main modules of ZEBRA are then run on input pho- 
tometric catalogues that contain no systematic errors in the 
calibrations. 
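
The fitting step of the photometry-check mode can be illustrated with a minimal sketch. It is not taken from the ZEBRA code: the function names, the restriction to a constant shift or a linear regression, and the sign convention (residual = template magnitude minus observed magnitude, added back to the observed magnitude) are assumptions made for this example only.

```python
from statistics import mean

def fit_residual_trend(obs_mag, delta_mag, order=0):
    """Least-squares fit of delta_mag versus obs_mag for ONE passband.

    order=0 fits a constant shift, order=1 a straight line.
    Returns (intercept, slope). Hypothetical helper, not ZEBRA's API.
    """
    if order == 0:
        return mean(delta_mag), 0.0
    if order != 1:
        raise ValueError("this sketch only supports order 0 or 1")
    mx, my = mean(obs_mag), mean(delta_mag)
    sxx = sum((x - mx) ** 2 for x in obs_mag)
    sxy = sum((x - mx) * (y - my) for x, y in zip(obs_mag, delta_mag))
    slope = sxy / sxx
    return my - slope * mx, slope

def apply_correction(obs_mag, intercept, slope):
    """Shift the observed magnitudes by the fitted residual trend."""
    return [m + intercept + slope * m for m in obs_mag]

# Toy band: the best-fit templates are 0.10 mag fainter than the data.
obs = [20.0, 21.0, 22.0, 23.0]
tmpl = [20.1, 21.1, 22.1, 23.1]
delta = [t - o for t, o in zip(tmpl, obs)]

c0, c1 = fit_residual_trend(obs, delta, order=0)
corrected = apply_correction(obs, c0, c1)
```

In the iterative scheme described above, the template fits would be repeated on the corrected magnitudes and the trend refitted until the residuals no longer change appreciably.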



^2 These residuals can be calculated using either the full
photometric catalogue, or the small "training set" of galaxies
with spectroscopic redshifts, if available. The latter approach
has the advantage that the known redshift can be kept fixed in
the template-galaxy fits, thereby reducing the scatter in the
detected trends.

3.2 A key element of ZEBRA: A novel automatic
template correction scheme

In principle, an advantage of template-matching approaches
for photo-z estimates is that they do not necessarily require a
training set of galaxies with accurately known redshift from
spectroscopic measurements. In practice, however, the
available templates (e.g., z = 0 galaxy SEDs or synthetic
models) are typically inadequate to reproduce the SEDs of real
galaxies at all redshifts. Therefore, a substantial error in the
estimate of the photo-z's in template-matching schemes is
contributed by mismatches between real galaxies and
available templates.

Budavári et al. (2000) propose, as a way to mitigate
this problem, to apply the training set approach within the
template-fitting method so as to optimize the shape of the
spectral templates, such that the predicted galaxy colors
(calculated using the spectroscopic redshift) best match
the observed colors. This is done by transforming the
discrete template space into a linear continuous space, and
using a Karhunen-Loève expansion to iteratively correct,
through a $\chi^2$ minimization scheme, the eigenbases of a
lower-dimensional subspace. As a result, spectral templates are
derived that are a better match to the SEDs of the galaxies
in the training set than are the initial model/empirical
templates (see also Csabai et al. 2000; Budavári et al. 2001;
Benítez 20M; Csabai et al. 2003 present an application of
this method to SDSS data).

Given the availability of a training set of galaxies in
the redshift interval of interest, ZEBRA uses a similar
template correction scheme, which however extends and
improves on the $\chi^2$-minimization approach adopted in the
previous works. The improvements include:

(i) The simultaneous application of the minimization 
scheme to all galaxies in the photometric sample at once; 

(ii) The introduction of a regularization term in the $\chi^2$
expression, which prevents unphysical, oscillatory wiggles in
the wavelength-dependent template correction functions;

(iii) A formalization of the minimization step that al- 
lows the use of interpolated templates (in magnitude space, 
so as to better sample the parameter space covered by the 
available original templates), and includes the effects of in- 
tergalactic absorption in a straightforward manner; 

(iv) Template corrections optimized in different user- 
specified redshift regimes. 

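Details on the implementation of the concepts above
are given in Appendix B. Briefly, ZEBRA minimizes, for all
catalogue entries i with best fitted template type t at once,
the following expression:

$$
\chi_t^2 = \sum_k \frac{\left[s_t^{\rm cor}(k) - s_t^{\rm orig}(k)\right]^2}{\sigma_{t,k}^2}
 + \frac{1}{|N_t|} \sum_{i \in N_t} \sum_{n=1}^{N_{\rm filter}}
   \frac{1}{\Delta_{n,i}^2} \left(f_{n,i}^{\rm cor} - f_{n,i}^{\rm obs}\right)^2
 + \sum_k \frac{\left[s_t^{\rm cor}(k+1) - s_t^{\rm cor}(k)
   - s_t^{\rm orig}(k+1) + s_t^{\rm orig}(k)\right]^2}{\rho_{t,k}^2},
\qquad (1)
$$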

with the following definitions: 

- $N_t$ is the set of catalogue entries, and contains all
galaxies which are best fitted by template type t.

- $\sigma_{t,k}$ is a pliantness parameter that regulates the
amplitude of the deviations of the corrected template shape
from the initial template shape.

- $s_t^{\rm cor}(k)$ is the corrected template shape for template
type t, and is a function of the wavelength k.

- $s_t^{\rm orig}(k)$ is the shape of the original template t.

- $\Delta_{n,i}$ is the error of the photometric flux density in
filter n for galaxy i.

- $f_{n,i}^{\rm cor}$ is the spectral flux density of the corrected
best fit template t in filter band n for galaxy i. The
dependence on the best fit template type t, best fit redshift z
and template normalization a is left implicit.

- $f_{n,i}^{\rm obs}$ is the observed spectral flux density of
galaxy i in filter band n.

- $\rho_{t,k}$ is the regularization parameter, which
constrains the gradients between original and corrected
template shapes. The smaller $\rho$, the stronger the suppression
of high-frequency oscillations in the shape of the corrected
templates.

The second term on the r.h.s. of equation (1) penalizes
the difference between observed flux $f_{n,i}^{\rm obs}$ and
template flux $f_{n,i}^{\rm cor}$ for all passbands n, averaged
over all galaxies i. With the appropriate choice for the
pliantness and regularization parameters, the first and third
terms ensure that the correction procedure generates only
templates with physically acceptable shapes. Specifically, the
first term prevents too large deviations between the corrected
and the uncorrected templates, and the last term regularizes
the shape and inhibits strong oscillations in the SED of the
corrected templates. Therefore, the minimization of the
so-defined $\chi_t^2$ is a compromise between two competing
requirements: On the one hand, each original template is
changed so that, averaging over all galaxies i which are best
fitted by that given original template, the corrected spectral
flux density closely matches the measured spectral flux
density. On the other hand, unphysical, large oscillations over
small wavelength ranges are avoided when correcting the shape
of the templates. The self-regulation terms maximize the
stability and reliability of the template corrections,
especially when only a small training set and/or modest S/N
data are available.

In principle, the optimal values of $\sigma$ and $\rho$ might be
both template and wavelength dependent. In Figure 2 we
show the effects of varying $\sigma$ and $\rho$ in the template
correction procedure. In particular, a too small value for
$\sigma$ inhibits template changes and thus reduces the efficacy
of the corrections, and a too large value for $\sigma$ leads to
unphysical high-frequency oscillations in the shape of the
corrected templates. The latter effect can be avoided by
choosing an appropriate value for $\rho$.
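
To make the role of the three terms concrete, the following sketch evaluates the template-correction $\chi^2$ for one template type. It is an illustration under simplifying assumptions, not the ZEBRA implementation: the function name is hypothetical, the pliantness and regularization parameters are taken to be independent of wavelength, and the band fluxes of the corrected template are supplied by the caller rather than derived from the template shape.

```python
def chi2_template(s_cor, s_orig, f_cor, f_obs, f_err, sigma, rho):
    """Evaluate the three terms of the template-correction chi^2.

    s_cor, s_orig : template shapes sampled on the same wavelength grid
    f_cor, f_obs, f_err : one list per galaxy, one entry per filter
    sigma, rho : pliantness and regularization (scalars in this sketch)
    """
    # Term 1: penalize deviations from the original template shape.
    pliant = sum((c - o) ** 2 for c, o in zip(s_cor, s_orig)) / sigma ** 2

    # Term 2: data misfit, averaged over the galaxies in the set.
    misfit = sum(
        ((fc - fo) / err) ** 2
        for gal_c, gal_o, gal_e in zip(f_cor, f_obs, f_err)
        for fc, fo, err in zip(gal_c, gal_o, gal_e)
    ) / len(f_obs)

    # Term 3: penalize differences in the gradient of the correction.
    d = [c - o for c, o in zip(s_cor, s_orig)]
    smooth = sum((d[k + 1] - d[k]) ** 2 for k in range(len(d) - 1)) / rho ** 2

    return pliant + misfit + smooth

s_orig = [1.0, 1.0, 1.0]
flat_shift = [1.2, 1.2, 1.2]   # smooth correction
wiggle = [1.2, 0.8, 1.2]       # oscillating correction
fluxes = dict(f_cor=[[1.0, 2.0]], f_obs=[[1.0, 2.5]], f_err=[[0.5, 0.5]])

chi2_flat = chi2_template(flat_shift, s_orig, sigma=2.0, rho=0.05, **fluxes)
chi2_wiggle = chi2_template(wiggle, s_orig, sigma=2.0, rho=0.05, **fluxes)
chi2_unreg = chi2_template(wiggle, s_orig, sigma=2.0, rho=float("inf"), **fluxes)
```

With the same data misfit, the oscillating correction is strongly disfavoured for a finite regularization parameter, while an infinite value switches the third term off, which mirrors the unregularized case discussed for Figure 2.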

The ZEBRA template correction is implemented in two 
main steps. The procedure is started by using in step 1 only 
the original templates, but is iterated so that each new itera- 
tion of step 1 uses the combined set of original and corrected 
templates. The two main steps are: 

(1.) Computation of the set $N_t$ that contains all galaxies
which are best fitted by the template t or (from the second
iteration on) by a corrected template originating from template
t.

(2.) Correction of the template shape of each original
template t by minimizing the corresponding $\chi_t^2$ expression.

The two steps are repeated several times, as the best fit
template type might change when considering new corrected
templates in step 1 in the computation of $N_t$.

Figure 2. The effect of varying the pliantness $\sigma$ and the
regularization parameter $\rho$ on the shape of the elliptical
template. Top: The original template of an elliptical galaxy
(solid black line) is compared with the corrected template using
the values $\sigma = 2$ and $\rho = 0.05$ (dotted black line).
Middle and Bottom: The results of using different parameter
choices are shown in detail. The solid and dotted lines
correspond to the same templates as above. The dashed green
($\sigma = 0.4$) and dot-dashed red ($\sigma = 2$) lines result
from applying the template correction scheme without
regularization, i.e. setting $\rho = \infty$. The unregularized
template with $\sigma = 0.4$ is well-behaved, but this low value
of $\sigma$ is still inadequate to properly fit the SEDs of
observed galaxies. However, when choosing a five times higher
pliantness ($\sigma = 2$) to try to improve the correction,
strong unphysical oscillations develop in wavelength ranges that
are smaller than the width of the filters. The template changes
are localized in separated wavelength regions and lead to
unrealistic, distinct bumps in the template shape. The
high-frequency oscillations may even require setting the flux of
the corrected template to zero, in order to avoid negative
spectral flux densities. These unphysical "over-corrections" are
avoided by choosing a finite regularization parameter $\rho$.

We note that ZEBRA can perform logarithmic interpolations
of the original (and corrected) templates; thus, the $\chi^2$
that is actually minimized in the code is modified relative
to the expression above so as to take this into account (see
Appendix B for details).

Finally, ZEBRA can optimize the automatic template
corrections in different redshift ranges by grouping the catalog
entries in different redshift bins before the $\chi^2$
minimization step. This option, tested on the COSMOS data
(Section 4), is found to substantially improve the reliability
and quality of the ZEBRA template corrections. In Appendix C we
test the method further by applying it to a mock catalog for the
COSMOS survey.



3.3 The ZEBRA Maximum Likelihood module 

The estimation of ML redshifts constitutes the core of the
ZEBRA code. To produce the ML redshifts, ZEBRA performs
the following steps (see Figure 1):

(i) Read the input photometric catalog, filters and
templates. The data in the catalogs (expected to be in
magnitudes) are converted into spectral flux densities. Data
errors are either read from the catalog or specified by the user
in one of several formats.

(ii) Interpolate the original templates (optional). Two
interpolation schemes are implemented, namely interpolation
in magnitude space ("log-interpolation") and in spectral flux
density ("lin-interpolation"); these can also be used
simultaneously. Specifically, a set of original templates is
first sampled on a fixed wavelength grid, and then used to
define the log-interpolated templates $s^{\rm log}_{t_1,t_2,g}(k)$,
with the weight g relative to the two basic adjacent templates
$s_{t_1}(k)$ and $s_{t_2}(k)$, defined as:

$$
s^{\rm log}_{t_1,t_2,g}(k) = \left[s_{t_1}(k)\right]^{1-g} \left[s_{t_2}(k)\right]^{g},
\quad g \in (0,1). \qquad (2)
$$

The lin-interpolated templates $s^{\rm lin}_{t_1,t_2,g}(k)$ are
instead linear combinations of the basic templates:

$$
s^{\rm lin}_{t_1,t_2,g}(k) = (1-g)\, s_{t_1}(k) + g\, s_{t_2}(k),
\quad g \in (0,1). \qquad (3)
$$

(iii) Sample the filter curves on two different grids. The
first grid is equal to the one used for the templates, and
coarsely samples the filter shapes (as most of its elements
are in wavelength bins where the transmission of the filters
is equal to zero); this grid is used to optimize the speed of
ZEBRA within the template correction scheme. The second
grid is optimized to sample with high accuracy each
individual filter in its transmission window; this
high-resolution grid is used to calculate the spectral flux
densities for each template in the different filter bands.

(iv) Correct the filter transmission functions for sharp
features occurring at particular wavelengths by smoothing
with a top-hat kernel. These modifications to the original
filter curves are found to prevent artificial peaks in the
likelihood functions; these peaks arise when, e.g., a strong
emission line in the galaxy or template spectrum is "trapped"
in a filter-transmission "hole", returning an overall
spuriously small $\chi^2$ value.

(v) Calculate the (redshift-optimized) corrections for
each original template as described in Section 3.2, and
extend the set of available templates to include both the
original and the corrected templates, and their interpolations.

(vi) Calculate, for each template t and redshift z, the
spectral flux densities $f_{z,t,n}$ in each filter band n. The
mean optical depth $\tau(\lambda, z)$ of the intergalactic
absorption is computed according to either Madau (1995) or
Meiksin (2006) (see Appendix A). The $f_{z,t,n}$ values are
stored in a three-dimensional array.

(vii) Determine the best fitting template normalization
factor $a^*(z,t)$ and the value of $\chi^2(z,t,a^*)$ using the
formulation of Benítez (2000) (see Equation A4, Appendix A).
A search in the two-dimensional array $\chi^2(z,t,a^*)$ is
carried out to find the minimum and thus the best fitting values
$z^*$ and $t^*$. A pair (z,t) is accepted only if the absolute B
magnitude $M_B$ lies within some (user-supplied) limits, so as
to avoid mathematically good fits which are however
physically unacceptable (as they would imply unrealistically dim
or bright galaxies at a given redshift). Similar constraints
are adopted by other authors, e.g. Rowan-Robinson (2003)
adopts the range $-22.5 < M_B < -13$.

(viii) Calculate the errors on the ML best fit redshift
estimates using constant $\chi^2$ boundaries,
$\chi^2_{\rm min} + \Delta\chi^2$, as (two-parameter) confidence
limits. For Gaussian-distributed errors $\Delta_n$, the values
$\Delta\chi^2 = 2.3$ and $\Delta\chi^2 = 6.17$ correspond to
$1\sigma$ and $2\sigma$ confidence limits, respectively. This
means that the probability that the "true" value pair
$(z^{\rm true}, t^{\rm true})$ falls in an elliptical region
which extends within $[z^* - \Delta z, z^* + \Delta z]$
when projected to the z axis, and within
$[t^* - \Delta t, t^* + \Delta t]$ when projected to the t axis,
is 68.3% and 95.4%, respectively.
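
The two interpolation schemes of step (ii) can be written down directly from equations (2) and (3). The sketch below is illustrative only; the function names are hypothetical, and the templates are assumed to be positive flux densities sampled on a common wavelength grid.

```python
def log_interpolate(s1, s2, g):
    """Equation (2): interpolation in magnitude space (geometric mean)."""
    return [a ** (1.0 - g) * b ** g for a, b in zip(s1, s2)]

def lin_interpolate(s1, s2, g):
    """Equation (3): interpolation in spectral flux density."""
    return [(1.0 - g) * a + g * b for a, b in zip(s1, s2)]

# Two toy templates on a two-point wavelength grid.
t1 = [1.0, 4.0]
t2 = [4.0, 16.0]

log_mid = log_interpolate(t1, t2, 0.5)   # [2.0, 8.0]
lin_mid = lin_interpolate(t1, t2, 0.5)   # [2.5, 10.0]
```

For the same weight g, the log-interpolated template never exceeds the lin-interpolated one, since a weighted geometric mean is bounded by the corresponding arithmetic mean.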

ZEBRA's ML module computes the full likelihood
functions in the two-dimensional redshift-template space,
which are then used as input for the ZEBRA BY estimates.
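
Steps (vii) and (viii) can be sketched as a grid search. The example assumes the standard weighted least-squares solution for the template normalization, which may differ in detail from the formulation referenced in Appendix A; it stores the model fluxes in a dictionary keyed by (z, t) rather than in a three-dimensional array, and it omits the absolute-magnitude cut. All names are hypothetical.

```python
def best_norm_and_chi2(f_obs, f_err, f_model):
    """Analytic best normalization a* and the chi^2 at a* for one (z, t)."""
    w = [1.0 / e ** 2 for e in f_err]
    num = sum(wi * fo * fm for wi, fo, fm in zip(w, f_obs, f_model))
    den = sum(wi * fm * fm for wi, fm in zip(w, f_model))
    a = num / den
    chi2 = sum(wi * (fo - a * fm) ** 2 for wi, fo, fm in zip(w, f_obs, f_model))
    return a, chi2

def ml_fit(f_obs, f_err, model_grid, delta_chi2=2.3):
    """Return the best (z, t), its chi^2 and the projected z interval."""
    chi2 = {zt: best_norm_and_chi2(f_obs, f_err, fm)[1]
            for zt, fm in model_grid.items()}
    best = min(chi2, key=chi2.get)
    # All grid points inside the constant chi^2 boundary.
    region = [zt for zt, c in chi2.items() if c <= chi2[best] + delta_chi2]
    z_values = [z for z, _ in region]
    return best, chi2[best], (min(z_values), max(z_values))

# Toy grid: three redshifts, two templates, three filters.
model_grid = {
    (0.4, 0): [1.0, 2.0, 3.0],
    (0.5, 0): [1.0, 3.0, 2.0],
    (0.6, 0): [3.0, 1.0, 2.0],
    (0.5, 1): [2.0, 2.0, 2.0],
}
f_obs = [2.0, 6.0, 4.0]      # exactly twice the (0.5, 0) model
f_err = [0.5, 0.5, 0.5]

best, chi2_min, z_interval = ml_fit(f_obs, f_err, model_grid)
a_star, _ = best_norm_and_chi2(f_obs, f_err, model_grid[best])
```

Projecting the accepted region onto the template axis in the same way would give the corresponding interval in t.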

3.4 The ZEBRA two-dimensional Bayesian module 

As discussed in Benítez (2000) and Brodwin et al. (2006),
employing the Bayesian method for the determination of
photometric redshifts enables the inclusion of prior
knowledge on the statistical properties of the galaxy sample
under study, and thus substantially improves, statistically, the
accuracy of the redshift estimates.

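The general idea behind Bayes' theorem is that the
"posterior" $P(\alpha|f)$, which provides the probability of the
parameters $\alpha$ given the data f, can be determined if the
"prior" $P(\alpha)$ and the likelihood $\mathcal{L}(\alpha)$ are
known. Specifically:

$$
P(\alpha|f) = P(\alpha)\,
\frac{\mathcal{L}(\alpha)}{\int d\alpha'\, P(\alpha')\, \mathcal{L}(\alpha')}.
\qquad (4)
$$

Despite its name, the function $P(\alpha)$ might not be
known a priori.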

In Benítez (2000), the Bayesian prior is calculated by
assuming an analytic function and fixing its free parameters
using the available galaxy catalog. The method is powerful:
an application is presented in Benítez (2003). In particular,
by construction, the resulting redshift distribution is
smooth and the effects of cosmic variance are reduced. The
chosen analytic form may however not necessarily take
properly into account the selection criteria of the galaxy
catalog under study; furthermore, some assumptions on the
relation amongst the different free parameters are required in
order to constrain the fit.

A different approach is described in Padmanabhan et al.
(2005). There, the true redshift density distribution is
estimated by "deconvolving" the measured maximum-likelihood
redshift distribution from the errors of the photometric
redshift estimates. This method has the advantage of being very
general; however, for degenerate distributions and/or small
galaxy samples, it may not converge to a stable solution unless
an additional prior is introduced.

To address these issues, Brodwin et al. (2006) propose
an iterative method to build the prior self-consistently, using
as a start the input photometric catalogue; in the redshift
domain, this method has the advantage of closely matching
specific over- and under-densities in the redshift distribution
of the target field which are due to cosmic variance. These
authors present extensive tests, performed on the galaxy
data and using Monte Carlo simulations, to show that the
method converges to a stable prior.

ZEBRA adopts the same self-consistent technique used
by Brodwin et al. (2006) to derive Bayesian estimates for
galaxy photometric redshifts, and furthermore extends that
formulation by applying the Bayesian analysis to the full
two-dimensional space of redshift and template. The
equation for the posterior is therefore re-written as:

$$
P(z,t|f^{\rm obs}) = P(z,t)\,
\frac{\mathcal{L}(z,t)}{\sum_{z',t'} P(z',t')\, \mathcal{L}(z',t')}.
\qquad (5)
$$

Naturally, the so-constructed prior will depend on the
selection criteria for the input sample. This dependence is
carried over into the posterior $P(z,t|f^{\rm obs})$, which
therefore represents the probability density of determining the
correct z and t, given the observed flux densities $f^{\rm obs}$
and the selection criterion for the sample. Also, note that the
values $z^*$ and $t^*$ of the maximum likelihood solution, and
the values $z^{\#}$ and $t^{\#}$ which maximize the posterior
probability, are generally different, as the latter are weighted
by the prior.

The prior is determined by starting with a user-specified
guess-prior $P_{\rm old}(z,t)$ (e.g. a flat prior; as long as
the initial guess is smooth enough, the iterative prior
calculation converges quickly to a unique answer; see Section
3.4.1), and calculating an improved prior $P_{\rm new}$ as:

$$
P_{\rm new}(z,t) = P_{\rm old}(z,t)\, \frac{1}{N_G} \sum_{i=1}^{N_G}
\frac{\mathcal{L}_i(z,t)}{\sum_{z',t'} P_{\rm old}(z',t')\, \mathcal{L}_i(z',t')}.
\qquad (6)
$$

Equation (6) follows from equation (5) by assuming that the
sample is large enough to be representative, i.e.

$$
\frac{1}{N_G} \sum_{i=1}^{N_G} P(z,t|f_i^{\rm obs}) \approx P(z,t),
\qquad (7)
$$

with $N_G$ the number of galaxies in the sample. By
constraining the prior to remain smooth at each iterative step
(by convolution with a Gaussian kernel; see below), a small
number of iterations, performed by resetting, after each
iteration, $P_{\rm old}(z,t) \leftarrow P_{\rm new}(z,t)$, is
found to converge to a stable result for the final prior
$P(z,t)$.
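
The prior update of equation (6) can be sketched on a discrete (z, t) grid as follows. This is an illustration rather than the ZEBRA code: the names are hypothetical, the likelihoods are stored as dictionaries keyed by (z, t), and the smoothing step that the text applies after each iteration is left out for brevity.

```python
def update_prior(prior, likelihoods):
    """One application of equation (6) on a discrete (z, t) grid.

    prior       : dict mapping (z, t) -> P_old(z, t), summing to 1
    likelihoods : list with one dict (z, t) -> L_i(z, t) per galaxy
    """
    accum = {zt: 0.0 for zt in prior}
    for like in likelihoods:
        # Denominator of equation (6): sum over the grid of P_old * L_i.
        evidence = sum(prior[zt] * like[zt] for zt in prior)
        for zt in prior:
            accum[zt] += like[zt] / evidence
    n_gal = len(likelihoods)
    return {zt: prior[zt] * accum[zt] / n_gal for zt in prior}

def iterate_prior(likelihoods, grid, n_iter=5):
    """Start from a flat guess-prior and iterate P_old <- P_new."""
    prior = {zt: 1.0 / len(grid) for zt in grid}
    for _ in range(n_iter):
        prior = update_prior(prior, likelihoods)
    return prior

# Toy sample: two galaxies favour z = 0.5, one favours z = 0.1.
grid = [(0.1, 0), (0.5, 0)]
likelihoods = [
    {(0.1, 0): 0.2, (0.5, 0): 0.8},
    {(0.1, 0): 0.2, (0.5, 0): 0.8},
    {(0.1, 0): 0.9, (0.5, 0): 0.1},
]

flat = {zt: 0.5 for zt in grid}
first = update_prior(flat, likelihoods)
final = iterate_prior(likelihoods, grid, n_iter=5)
```

Each update preserves the normalization of the prior, because the per-galaxy posteriors that are averaged in equation (7) each sum to unity over the grid.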

In practice, it is clearly advisable to exclude unreliable
redshift determinations in the calculation of equation (6);
these can be contributed by galaxies with poor template fits
and by galaxies with too sparsely sampled SEDs (i.e., with
photometric data in only a small number of passbands). In
our application of ZEBRA to the COSMOS data (Section 4),
we define as "good fits" those with values of $\chi^2$ smaller
than the threshold $\chi^2_{0.99}$; this threshold is defined by
the condition that, assuming that the $\chi^2$ values follow a
$\chi^2$ distribution with $N_{\rm filter} - 3$ degrees of
freedom, the probability of having a value of $\chi^2$ smaller
than $\chi^2_{0.99}$ is 99%. For example, for our application to
the COSMOS data with photometry in eight filters (i.e., for five
degrees of freedom), the threshold is given by
$\chi^2_{0.99} \approx 15$. We have tested, using as thresholds
also some specified percentiles of the measured $\chi^2$
distribution, that the final result is rather insensitive to the
choice of the threshold.
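
The quoted threshold can be checked numerically. The sketch below computes the 99% quantile of a $\chi^2$ distribution using only the Python standard library (a series expansion of the regularized incomplete gamma function plus bisection); the function names are hypothetical and the approach is a convenience for this example, not a description of how ZEBRA evaluates the threshold.

```python
import math

def chi2_cdf(x, dof):
    """CDF of the chi^2 distribution via the incomplete-gamma series."""
    if x <= 0.0:
        return 0.0
    a, y = dof / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-15 * total:
        term *= y / (a + n)
        total += term
        n += 1
    return total * math.exp(-y + a * math.log(y) - math.lgamma(a))

def chi2_quantile(p, dof):
    """Value x such that chi2_cdf(x, dof) = p, found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, dof) < p:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, dof) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def good_fit_threshold(n_filters, p=0.99):
    """Threshold for N_filter - 3 degrees of freedom, as in the text."""
    return chi2_quantile(p, n_filters - 3)

threshold = good_fit_threshold(8)   # eight filters, five degrees of freedom
```

For eight filters this gives a value just above 15, in line with the approximate threshold stated above.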

3.4.1 Smoothing of the prior

In principle, the probability density distribution of finding 
a galaxy at a given redshift should be a smooth function of 
z. In practice, however, N{z) is estimated from the galaxy 
survey under study. The biased sampling of the large-scale 
structure, due to the finite area covered by the specific sur- 
vey, and the shot-noise, due to the finite number of galaxies 
in the survey, generate high-frequency fluctuations in the 
observed redshift distribution. The presence of sharp fea- 
tures in the estimated number counts leads to a runaway 
effect in the iterative procedure to determine the best prior. 
For galaxies whose likelihood peaks close to the redshift of 
these features, the redshift estimation is fully driven by the 
prior. Therefore, peaks in the number counts become more 
and more prominent after every iteration at the expense of
the surrounding regions. The net effect is that, after a few 
iterations, the prior becomes very spiky. 

This instability needs to be eliminated for a proper
Bayesian estimation of galaxy redshifts. This can be done
by building on the key ideas for introducing a prior, which
are (a) to account for the fact that not all redshifts are
equally likely, and (b) to help to distinguish between
degenerate peaks of the likelihood functions. Therefore, the prior
should not contain features that are narrower than the
characteristic width of the peaks in the likelihood functions.

A simple way to solve the problem is to smooth the prior
after each iteration^. The smoothing scale must be chosen
by comparing a number of characteristic scales:

(a) The intrinsic broadness (in redshift space) of the
features originated by large-scale structures, $\sigma_{\rm LSS}$;

(b) The standard error of the Maximum Likelihood
estimator, $\sigma_{\rm ML}$;

(c) The typical broadness of the likelihood functions,
$\sigma_{\cal L}$ (which, when photometric errors are properly estimated,
has to be comparable with $\sigma_{\rm ML}$); and

(d) The characteristic scale of the oscillations due to
finite Poisson sampling, $\sigma_{\rm P}$ (basically the maximum redshift
difference between two Maximum Likelihood estimates with
consecutive redshifts).

We have studied the effect of these different sources of
error by performing a series of Monte Carlo simulations. In
brief, we first Poisson-sample a given redshift distribution,
with and without sharp features generated by large-scale
structures, and then apply our iterative procedure assuming
Gaussian-shaped likelihoods. Convergence to a smooth
prior is always achieved, in a few iterations, by smoothing
the number counts with a Gaussian kernel of width
$\sigma = \max(\sigma_{\rm ML}, \sigma_{\rm P})$. Note that, at low redshifts, where
both $\sigma_{\rm ML}$ and $\sigma_{\cal L}$ are small, and for large samples, where
also $\sigma_{\rm P}$ is small, the prior might be affected by the
presence of large-scale structures. Basically all features such that
$\sigma_{\rm LSS} > \sigma$ are broad enough to be robustly detected and are
present in the final prior distribution. This enhances the
probability of measuring redshifts close to, e.g., the location
of large overdensities, and leads to an optimal estimation of
photometric redshifts in a galaxy survey.

^ Equivalently, one can smooth the likelihood functions as in
Fernández-Soto et al. (2002) and Brodwin et al. (2006).

Figure 3. The two-dimensional probability distribution in
template and redshift space for one of the COSMOS galaxies in the
sample that we discuss in Section 4. For illustration purposes,
the corrected templates used for this run of ZEBRA are collapsed
in the figure onto the corresponding 31 "uncorrected" (original
plus log-interpolated) templates. The values of z* and t* of the
Maximum Likelihood solution, and the values of z and t which
maximize the posterior probability, are labeled in the figure.

Figure 4. The transmission curves for the eight filters used to
derive the COSMOS photo-z's discussed in this paper. The
original filters are shown as dotted lines. The adopted filter shapes
are shown as solid lines. We removed "holes" in the transmission
curves and smoothed them using a top-hat kernel with a FWHM
of 200 Å. Top panel: the COSMOS filters u*, B, g', V, r', i' and
z'. Bottom panel: The original and adopted Ks filter shapes.

In ZEBRA we have thus implemented a routine to smooth 
the prior, at each step of the iterative procedure described 
above, by convolution with a Gaussian kernel with a user- 
specified sigma. 
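The smoothing step can be illustrated with a short sketch. This is not the ZEBRA routine: the function name, the binned representation of N(z), and the treatment of the redshift boundaries (renormalizing the truncated kernel, then rescaling to conserve the total counts) are all assumptions made for the example.

```python
import math


def smooth_prior(counts, dz, sigma):
    """Smooth binned number counts N(z) with a Gaussian kernel.

    counts : number of galaxies per redshift bin
    dz     : bin width in redshift
    sigma  : kernel width in redshift (user-specified)
    The kernel is renormalized near the edges of the grid, and the result is
    rescaled so that the total number of galaxies is preserved.
    """
    n = len(counts)
    smoothed = []
    for i in range(n):
        weight_sum = 0.0
        acc = 0.0
        for j in range(n):
            w = math.exp(-0.5 * ((i - j) * dz / sigma) ** 2)
            weight_sum += w
            acc += w * counts[j]
        smoothed.append(acc / weight_sum)
    total_in, total_out = sum(counts), sum(smoothed)
    scale = total_in / total_out if total_out > 0 else 0.0
    return [s * scale for s in smoothed]


# A flat distribution with one sharp spike, as produced by a large-scale structure.
raw = [10.0] * 20
raw[10] = 100.0
print(max(raw), round(max(smooth_prior(raw, dz=0.05, sigma=0.1)), 1))
```

Applied after every iteration, a step of this kind prevents a narrow spike from being amplified, while features broader than the kernel survive.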

3.4.2 The two-dimensional probability distribution in
redshift and template space

As an example, in Figure 3 we show the two-dimensional
probability distribution in template and redshift space for
one of the COSMOS galaxies in the sample that we discuss
in Section 4. The values of z* and t* of the Maximum
Likelihood solution, and the values of z and t which maximize the
posterior probability, are indicated in the figure. The
distribution shows multiple peaks, and is dramatically different
from, e.g., the Gaussian shape that would be typically
associated with a ML photometric redshift estimate. The key
strength of the Bayesian analysis is indeed to provide, for
each galaxy in a sample, such detailed information, as this
is crucial to almost all statistical analyses of the evolution
of galaxy properties with redshift.



4 THE 1st APPLICATION OF ZEBRA: Z-COSMOS-TRAINED REDSHIFTS FOR COSMOS

4.1 The data, the sample and the input templates 

A detailed comparison of ZEBRA's photo-z estimates with
those obtained with other codes is presented in Mobasher
et al. (2006). Here we limit the demonstration of the
performance of ZEBRA by using a sample of 866 z < 1.3, $I_{AB} \le 22.5$
COSMOS galaxies with currently available accurate (i.e.,
"confidence class" 3 and 4) spectroscopic redshifts from
zCOSMOS (the ESO VLT spectroscopic redshift survey of
the COSMOS field; Lilly et al. 2006). A further test on mock
galaxies is presented in Appendix C.

These 866 galaxies with zCOSMOS spectroscopic
redshifts belong to the complete sample of about 55000 $I_{AB} \le 24$
COSMOS galaxies discussed in Scarlata et al. (2006):
we use this complete sample to construct the initial
guess-prior in the Bayesian calculation of the photometric
redshifts for the COSMOS galaxies. The allowed range for the
galaxy absolute B magnitudes was conservatively fixed to
be $-24 < M_B < -13$.

Exploiting the wealth of ancillary data that are
available for the entire COSMOS field, we use, as input
photometric catalogues, Subaru B, V, g', r', i' and z' photometry
(5$\sigma$ magnitude limit of ~27 for point sources in all bands;
Taniguchi et al. 2006); CFHT u* photometry (5$\sigma$ magnitude
limit for point sources of u* ~ 27.4); and Ks photometry
from the NOAO wide-field IR imager Flamingos (Kitt Peak
4m telescope) and the Cerro Tololo ISPI (Blanco 4m
telescope; all data are collected in the catalogue presented by
Capak et al. 2006). About 97.2% of our spectroscopic
sample has photometry available in all eight passbands. The
relevant filter transmission curves are shown in Figure 4
before and after the correction for sharp features at specific
wavelengths. Systematic calibration errors in each passband
were estimated and corrected using the photometry-check
module of ZEBRA. These were typically very small, constant
shifts. The robustness of the corrections was tested by
deriving them also after fixing the redshift to the spectroscopic
value in the galaxy-template fits.

Figure 5. ML photometric redshifts for the COSMOS
sample under study, derived from the uncorrected Coleman et al.
(1980) and Kinney et al. (1996) templates plus their
log-interpolated templates. In the upper panel, the redshift estimates
$z_{\rm phot,ML,NTC}$ are plotted against the zCOSMOS $z_{\rm spec}$
spectroscopic redshifts. Each symbol in the plot corresponds to an
individual galaxy. Empty symbols indicate a "bad" fit, defined as
a fit with a reduced $\chi^2 > 3$. The lower panel shows the
dependence of the accuracy of the photometric estimates, as quantified
by $\Delta z/(1+z)$ (with $\Delta z = z_{\rm phot,ML,NTC} - z_{\rm spec}$), as a function
of $z_{\rm spec}$. Colors represent different templates: elliptical (dotted
red), Sbc (short-dashed orange), Scd (long-dashed green) and
irregular (dot-dashed blue) types. The total residual, independent
of template type, is shown by a solid black line. Only templates
which contain at least five objects in the respective redshift bin
are shown. Interpolated template types are rounded to their
nearest basic template type and plotted with the corresponding color.
The short-dashed $\Delta z/(1+z) = \pm 0.03$ lines correspond roughly to
1-$\sigma$ error bars and are shown to guide the eye.

The adopted "original" set of templates consists of the
six templates described in Benítez (2000) (which are
available on the BPZ website). These are based on six observed
galaxy spectra, i.e., the four of Coleman et al. (1980), i.e., an
elliptical, a Sbc, a Scd and an irregular type template, and
the two starbursting galaxy spectra of Kinney et al. (1996).
Sawicki et al. (1997), Benítez et al. (1999) and Benítez
(2000) discuss and demonstrate the improvements in the
quality of the redshift estimates that are obtained by
augmenting the set of templates to include the starbursting
types. As discussed by these previous authors, these
observed templates are extended into the UV by means of a
linear extrapolation up to the Lyman-Break, and into the
IR (up to ~25000 Å) using GISSEL synthetic templates. We
furthermore performed a 5-step log-interpolation to sample
more densely the SED space covered by the original
templates. This results in a basic set of 31 input ("uncorrected")
templates.

Figure 6. Error-normalized flux residuals $(f^{\rm obs}_{n,i} - f_{n,i})/\Delta_{n,i}$
versus rest-frame wavelength for 269 galaxies with spectroscopic
redshifts in the range $0.2 \le z < 0.4$. The different panels
correspond to different templates: elliptical ("Ell") types, Sbc types,
Scd types and irregular ("Im") template types. Top: Residuals
before ZEBRA's automatic template correction is applied.
Bottom: Residuals after ZEBRA's template correction. The systematic
trends and scatter are substantially reduced.
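The template count follows from simple bookkeeping: six ordered templates have five adjacent pairs, and five interpolated templates per pair give 6 + 5 x 5 = 31. The sketch below illustrates one way to build such a set. It is an assumption-laden illustration, not the ZEBRA implementation: the function name is hypothetical, the interpolation weights are taken to be evenly spaced, and each template is assumed to be a list of strictly positive fluxes on a common wavelength grid.

```python
def log_interpolate(templates, n_steps=5):
    """Insert n_steps log-interpolated templates between adjacent templates.

    An interpolated template with weight g is s1**(1 - g) * s2**g, i.e. a
    linear interpolation of log-flux. Evenly spaced weights g = j/(n_steps+1)
    are an assumption of this sketch.
    """
    out = []
    for s1, s2 in zip(templates, templates[1:]):
        out.append(list(s1))
        for j in range(1, n_steps + 1):
            g = j / (n_steps + 1)
            out.append([a ** (1.0 - g) * b ** g for a, b in zip(s1, s2)])
    out.append(list(templates[-1]))
    return out


# Six toy "templates" on a three-point wavelength grid.
basic = [[1.0 + t, 2.0 + t, 3.0 + t] for t in range(6)]
print(len(log_interpolate(basic)))  # 6 originals + 5 * 5 interpolated
```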



performed ($z_{\rm phot,ML,NTC}$), and the zCOSMOS spectroscopic
redshifts ($z_{\rm spec}$). Figure 5 presents this comparison.

In the figure, the bottom panel shows the deviation
$\Delta z/(1+z)$ versus $z_{\rm spec}$ (with $\Delta z = z_{\rm phot,ML,NTC} - z_{\rm spec}$),
color-coded for the different template types. Although over
the entire $0 \le z_{\rm spec} < 1.3$ range the overall redshift estimate
is acceptable (a 5$\sigma$-clipped $\sigma_{\Delta z/(1+z)} = 0.043$ with 19
clipped galaxies), the individual templates show large
systematic deviations. The elliptical and Sbc templates in
particular show a significant systematic under- and
overestimation of the redshifts, respectively. Furthermore, no available
"uncorrected" (i.e., original plus log-interpolated) template
appears to be adequate to reproduce the SEDs of z > 0.8
galaxies: these high redshifts are systematically
underestimated when using the available z = 0 galaxy templates. While
it remains to be established whether this systematic failure at
z > 0.8 is due to the uncertainties in the templates or to
astrophysical reasons (e.g., much stronger emission lines at
high redshifts than at z = 0, or a young, passively evolving
elliptical galaxy population, etc.), it is clear that this
systematic effect would have a substantial impact on the reliability
of statistical studies of galaxy evolution with redshift.

The template correction substantially improves the
photometric redshift estimates, and in particular cures the most
troublesome systematic failures of the estimates derived
without template correction. As an example, for galaxies
with redshifts in the range $0.2 \le z < 0.4$, Figure 6 shows
the residuals $\Delta f$ between observed flux density and
best-fit template flux density, as a function of rest-frame
wavelength, before and after template corrections (using $\sigma = 2$
and $\rho = 0.05$*). The substantial improvement in the
redshift estimates is observable in Figure 7, which shows the
same comparison with the zCOSMOS spectroscopic
redshifts as above, but this time for the ZEBRA photometric
redshift estimates with template correction ($z = z_{\rm phot,ML,TC}$).
The template corrections were optimized in the redshift bins
z = 0-0.2; 0.2-0.4; 0.4-0.6; 0.6-0.8; 0.8-1.0; 1.0-1.3 and 0-0.3;
0.3-0.5; 0.5-0.7; 0.7-0.9; 0.9-1.3^). Note that the $z_{\rm phot,ML,TC}$
redshifts at and above ~0.8 now lie well within the
statistical errors. The global scatter of the ZEBRA ML
redshift estimates $z_{\rm phot,ML,TC}$ is now reduced to a 5$\sigma$-clipped
$\sigma_{\Delta z/(1+z)} = 0.027$ (with a clipping of only 10 galaxies).
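For readers who wish to reproduce this kind of statistic on their own catalogs, the following is a minimal sketch of a 5$\sigma$-clipped scatter of $\Delta z/(1+z)$. The text does not spell out the clipping recipe, so the choices made here (iterating until no further galaxy is rejected, clipping about the mean, and using the population standard deviation) are assumptions, and the function name is hypothetical.

```python
import math


def clipped_scatter(z_phot, z_spec, n_sigma=5.0):
    """Iteratively sigma-clipped scatter of (z_phot - z_spec) / (1 + z_spec).

    Returns (sigma, mean, number of clipped galaxies).
    """
    resid = [(zp - zs) / (1.0 + zs) for zp, zs in zip(z_phot, z_spec)]
    kept = list(resid)
    while True:
        mean = sum(kept) / len(kept)
        sigma = math.sqrt(sum((r - mean) ** 2 for r in kept) / len(kept))
        inside = [r for r in kept if abs(r - mean) <= n_sigma * sigma]
        if len(inside) == len(kept):  # converged: nothing more to reject
            return sigma, mean, len(resid) - len(kept)
        kept = inside


# 100 galaxies scattered by +-0.02 in dz/(1+z), plus one catastrophic failure.
z_spec = [0.5] * 101
z_phot = [0.53 if i % 2 else 0.47 for i in range(100)] + [2.0]
sigma, mean, n_clipped = clipped_scatter(z_phot, z_spec)
print(round(sigma, 3), n_clipped)
```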

Similar results are found when comparing the ZEBRA BY
redshifts with the zCOSMOS spectroscopic redshifts. In
Figure 8 we show the results of the iterative calculation of the
prior using the > 56000 galaxies in the entire ACS-selected
$I_{AB} \le 24$ COSMOS sample of Scarlata et al. (2006) from
which our spectroscopic sample was extracted. The prior
was obtained using an adaptive Gaussian smoothing kernel
of $\Gamma = 0.05(1+z)$, which was tested to lead to a stable prior
estimate. In the figure, the left panel shows the prior
estimate, marginalized to redshift space, after one (dotted lines)
and five (solid lines) iterations. Although we only present the
prior marginalized over template types, the full 2D-prior is
being used for the subsequent calculation of the posterior.



4.2 Results 

To illustrate the importance of the ZEBRA template correc- 
tion, we first present the comparison between the ML pho- 
tometric redshifts derived when no template correction is 



* A large volume of $\sigma$-$\rho$ parameter space was explored. Tests show
that the ZEBRA solutions are quite stable and do not depend on
small variations of these parameters.

^ The choice of overlapping redshift bins was made to avoid
spurious "boundary" effects in the derivation of the redshift-optimized
templates.

Figure 7. ZEBRA's ML redshift estimates for the spectroscopic
sample derived after correcting the templates as described in
Sections 3.2 and 4. The displayed quantities are the same as in Fig. 5.
The systematic trend that is visible in Fig. 5, i.e. the
underestimation of the redshifts for $z \gtrsim 0.8$, is here eliminated by the use
of adequately corrected templates.

The right panel shows the ratio $\eta(z)$ of the marginalized
priors P(z) from two successive iterations; the dotted lines
correspond to the ratio of the priors after the second and first
iteration; the solid lines show the prior ratio between the
fifth and fourth iterations.

In Figure 9 we present the ZEBRA-zCOSMOS
comparison as above, this time for the ZEBRA BY redshift estimates
derived with template correction ($z_{\rm phot,BY,TC}$). These ZEBRA
BY redshifts are obtained using the values of z and t which
maximize the posterior. The figure highlights a similar high
quality for the ML and BY ZEBRA estimates with template
correction; indeed, the differences between the
template-corrected BY and ML redshift estimates are vanishingly
small. Of course, in BY mode ZEBRA returns the redshift
and template probability distribution for each galaxy. The
BY run gives a 5$\sigma$-clipped accuracy of $\sigma_{\Delta z/(1+z)} = 0.027$
with only 7 outliers clipped, comparable to the one derived
for the ML estimates.



5 CONCLUDING REMARKS 

A more thorough comparison of the ZEBRA ML and BY
photo-z's with the currently available zCOSMOS redshifts
is given by Lilly et al. (2006). Furthermore, the ZEBRA ML
and BY photometric redshift estimates are compared by
Mobasher et al. (2006) with photo-z estimates derived for
the same galaxies with independent codes (these are either
public codes, e.g. BPZ, or have been developed by other
teams within the COSMOS collaboration).

The ZEBRA ML photometric redshift estimates for the
COSMOS sample studied in this paper have already been
used to derive the evolution up to z ~ 1 of the
luminosity functions for morphologically-classified early-, disk-
and irregular-type galaxies (according to the classification
scheme of ZEST, the Zurich Estimator of Structural Types;
Scarlata et al. 2006), to study the evolution up to similar
redshifts of the number density of intermediate-size and
large disk galaxies (Sargent et al. 2006), and to study the
evolution of the luminosity function of elliptical galaxy
progenitors (Scarlata et al. 2006a).

Figure 8. The marginalized prior derived from the COSMOS
sample of Scarlata et al. (2006), from which our spectroscopic
sample is extracted, and its convergence properties. Upper panel:
The marginalized prior after the first iteration (dotted line) and
after five iterations (solid line) for a Gaussian smoothing length
$\Gamma = 0.05(1+z)$ in redshift space. Lower panel: The point-by-point
ratio of two successive redshift-marginalized prior estimations.
The dotted line shows the ratio of the prior estimates between
the second and first iteration. The solid line shows the prior ratio
between the fifth and fourth iteration.

Figure 9. The application of ZEBRA's two-dimensional Bayesian
method (i) after smoothing of the projected P(z) prior with a
smoothing scale of $\Gamma = 0.05(1+z)$ and (ii) using the same
template correction as in Fig. 7. Symbols are as in Figure 5 (except
that no $\chi^2$ threshold is shown). Similar to the ZEBRA ML estimates
with template correction, also the ZEBRA BY photo-z's with
template correction eliminate the systematic trends that are present
in the redshift estimates without template correction.

The current version (1.0) of ZEBRA is being packaged
with a user-friendly web-interface at the URL provided in
the abstract. The ZEBRA website will be constantly updated
to provide the newest improved versions of the ZEBRA code,
and the associated documentation describing in detail the
implemented changes. In the meantime, ZEBRA is being
upgraded with several new modules, including (1) A module
that incorporates dust absorption and reddening, according
to several user-specifiable extinction corrections; (2) An
improved treatment of AGNs; and (3) A module that uses
several synthetic template models and a large choice of
self-consistent star formation and metal enrichment schemes to
estimate stellar masses, average ages and metallicities (and
their uncertainties).

APPENDIX A: NOTATION AND DEFINITIONS

The filter-averaged spectral flux density of a template can
be decomposed into a template-based spectral flux density
$f_{z,t,n}$ and a template normalization $a$ by setting:



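$$ a f_{z,t,n} = \frac{\int d\nu\, \nu^{-1}\, s^{\rm obs}_{\nu,t}(\nu)\, n(\nu)}{\int d\nu\, \nu^{-1}\, n(\nu)}. \tag{A1} $$

The spectral flux density $s^{\rm obs}_{\nu,t}$ measured in the observer
frame is related to the rest-frame template shape $s_{\nu,t}$ by:

$$ s^{\rm obs}_{\nu,t}\left(\frac{\nu}{1+z}\right) = (1+z)\,\frac{L_\nu(\nu)}{4\pi D_L^2} = (1+z)\, s_{\nu,t}(\nu)\, a_t. \tag{A2} $$

The normalization factor $a_t$ matches the template shape
$s_{\nu,t}(\nu)$ with the apparent spectral flux density in the rest
frame of a point-source with luminosity $L_\nu(\nu)$ at a
luminosity distance $D_L$. The relation $a = a_t/(c(1+z))$ between $a$
and $a_t$ can then be derived from the above definitions.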

For high redshift sources, attenuation effects of
intervening intergalactic material, especially neutral hydrogen,
become increasingly important. These attenuation effects
are mostly contributed by Lyman series line-blanking and
photoelectric absorption^. These effects are accounted for
by including a factor of $e^{-\tau(\lambda,z)}$ in the expression for $f_{z,t,n}$,
i.e., defining:

$$ f_{z,t,n} = \frac{\int d\lambda\, \lambda\, s_{\lambda,t}(\lambda/(1+z))\, e^{-\tau(\lambda,z)}\, n(\lambda)}{\int d\lambda\, \lambda^{-1}\, n(\lambda)}. \tag{A3} $$

The ZEBRA user can choose to adopt either the Madau (1995)
or the Meiksin (2006) calculation for the attenuation term;
compared with the former, the latter provides a somewhat
lower absorption strength at any redshift. The K-correction
$K_{nl}$ between filter band $l$ in the rest-frame and filter
band $n$ in the observed-frame is defined as:

$$ K_{nl} = m_n - M_l - 5\log_{10}(D_L/10\,{\rm pc}), $$

with $M_l$ the absolute magnitude in the filter $l$, and $m_n$ the
apparent magnitude in the filter band $n$. The K-correction
can be written as:

$$ K_{nl}(z,t) = {\rm mag}(a f_{z,t,n}) - {\rm mag}(f^{\rm rf}_{l}) $$
$$ \qquad = {\rm mag}(a f_{z,t,n}) - {\rm mag}(a(1+z) f_{0,t,l}) $$
$$ \qquad = 2.5\log_{10}(1+z) + {\rm mag}(f_{z,t,n}) - {\rm mag}(f_{0,t,l}), $$

where $f^{\rm rf}_{l}$ is the rest-frame spectral flux density in the filter
band $l$ and ${\rm mag}(x) = -2.5\log_{10}(x)$. ZEBRA can provide the
K-corrections for all (original, interpolated and corrected)
used templates.
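As a worked illustration of the last line of the K-correction formula, the sketch below evaluates $2.5\log_{10}(1+z) + {\rm mag}(f_{z,t,n}) - {\rm mag}(f_{0,t,l})$ from two filter-averaged template flux densities. The function names and the input values are hypothetical; computing the flux densities themselves (equation A3) is outside the scope of the example.

```python
import math


def mag(x):
    """mag(x) = -2.5 log10(x), as defined in the text."""
    return -2.5 * math.log10(x)


def k_correction(z, f_z_n, f_0_l):
    """K-correction between rest-frame band l and observed band n.

    f_z_n : template flux density at redshift z through filter n
    f_0_l : template flux density at z = 0 through filter l
    """
    return 2.5 * math.log10(1.0 + z) + mag(f_z_n) - mag(f_0_l)


# If the redshifted flux through band n equals the rest-frame flux through
# band l, only the bandwidth-stretching term 2.5 log10(1 + z) remains.
print(round(k_correction(1.0, 2.0, 2.0), 4))
```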

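The normalized likelihood ${\cal L}(z,t,a)$ can be written as:

$$ {\cal L}(z,t,a) = \frac{P(f^{\rm obs}|z,t,a)}{\sum_{t'} \int dz' \int da'\, P(f^{\rm obs}|z',t',a')} = \frac{e^{-\chi^2(z,t,a)/2}}{\sum_{t'} \int dz' \int da'\, e^{-\chi^2(z',t',a')/2}}, $$

with $P(f|\alpha)$ as the conditional probability distribution of
reproducing the data $f$ given the parameters $\alpha$.

The $\chi^2(z,t,a)$ can be expressed as (Benítez 2000):

$$ \chi^2(z,t,a) = F_{OO} - \frac{F_{OT}^2}{F_{TT}} + \left(a - \frac{F_{OT}}{F_{TT}}\right)^2 F_{TT}, \tag{A4} $$

where

$$ F_{OO} = \sum_{n=1}^{N_B} \frac{(f^{\rm obs}_n)^2}{\Delta_n^2}, \qquad F_{TT} = \sum_{n=1}^{N_B} \frac{f_{z,t,n}^2}{\Delta_n^2}, \qquad F_{OT} = \sum_{n=1}^{N_B} \frac{f^{\rm obs}_n f_{z,t,n}}{\Delta_n^2}. $$

In this formulation, the best fitting template
normalization $a^*$ is given by $a^* = F_{OT}/F_{TT}$. The best fitting
redshift $z^*$ and template type $t^*$ follow from the maximum of
$F_{OT}^2/F_{TT}$. The largest likelihood corresponds to the
minimum $\chi^2_{\rm min} = F_{OO} - (F_{OT}^2/F_{TT})(z^*,t^*)$.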
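The practical benefit of this formulation is that the normalization never has to be fitted numerically. A minimal sketch, with hypothetical function names and toy numbers, of the three sums and the resulting closed-form best fit for one template at one redshift:

```python
def chi2_sums(f_obs, f_tmpl, err):
    """Return (F_OO, F_TT, F_OT) for observed fluxes, template fluxes, errors."""
    f_oo = sum((o / e) ** 2 for o, e in zip(f_obs, err))
    f_tt = sum((t / e) ** 2 for t, e in zip(f_tmpl, err))
    f_ot = sum(o * t / e ** 2 for o, t, e in zip(f_obs, f_tmpl, err))
    return f_oo, f_tt, f_ot


def chi2(a, f_obs, f_tmpl, err):
    """chi^2 at normalization a, written as in equation (A4)."""
    f_oo, f_tt, f_ot = chi2_sums(f_obs, f_tmpl, err)
    return f_oo - f_ot ** 2 / f_tt + (a - f_ot / f_tt) ** 2 * f_tt


def best_fit(f_obs, f_tmpl, err):
    """Closed-form best normalization a* and the corresponding minimum chi^2."""
    f_oo, f_tt, f_ot = chi2_sums(f_obs, f_tmpl, err)
    return f_ot / f_tt, f_oo - f_ot ** 2 / f_tt


# Toy data: the observed fluxes are exactly twice the template fluxes.
f_obs, f_tmpl, err = [2.0, 4.0, 6.0], [1.0, 2.0, 3.0], [0.1, 0.2, 0.3]
a_star, chi2_min = best_fit(f_obs, f_tmpl, err)
print(a_star, round(chi2_min, 6))
```

In a full fit, `best_fit` would be evaluated on the grid of redshifts and templates, and the pair with the smallest minimum $\chi^2$ retained.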



^ The component contributed by photoelectric absorption is
estimated by the approximation given in footnote 3 of Madau (1995).



APPENDIX B: THE ZEBRA $\chi^2$-MINIMIZATION
APPROACH TO TEMPLATE CORRECTION

We first describe the simple case when only the original set
of templates is used as input, without interpolations between
the original templates. We indicate with $N_t$ the number of
catalog entries which are best fitted by a template type $t$,
and with $\mathcal{N}_t$ the set of these entries.
In Budavári et al. (2000) the spectral distribution $s_t^{\rm orig}(k)$ of
the original template type $t$ is changed by a $\chi^2$-minimization
over all template shapes $s_t^{\rm cor}(k)$, iteratively for all entries
$i \in \mathcal{N}_t$. Specifically, Budavári et al. (2000) perform the template
correction by minimizing the following $\chi^2$ function:

$$ \chi^2_{t,i} = \sum_k \frac{1}{\sigma_k^2}\left(s_t^{\rm cor}(k) - s_t^{\rm orig}(k)\right)^2 + \sum_n \frac{1}{\Delta_{n,i}^2}\left(f^{\rm cor}_{n,i} - f^{\rm obs}_{n,i}\right)^2. $$

In our approach, the shape $s_t^{\rm orig}(k)$ of a given basic
template $t$ is changed in one step, taking all entries $i \in \mathcal{N}_t$ into
account at once; furthermore, a regularization term is
included in the definition of $\chi^2$, to avoid unphysical
high-frequency fluctuations in the correction of the template as a
function of wavelength. We therefore determine the optimal
template corrections by minimizing:

$$ \chi^2_t = \sum_k \frac{\left(s_t^{\rm cor}(k) - s_t^{\rm orig}(k)\right)^2}{\sigma_{t,k}^2} + \frac{1}{N_t}\sum_{i=1}^{N_t}\sum_{n=1}^{N_B} \frac{\left(f^{\rm cor}_{n,i} - f^{\rm obs}_{n,i}\right)^2}{\Delta_{n,i}^2} + \sum_k \frac{\left(s_t^{\rm cor}(k+1) - s_t^{\rm cor}(k) - s_t^{\rm orig}(k+1) + s_t^{\rm orig}(k)\right)^2}{\rho_{t,k}^2}, \tag{B1} $$

with the variables as described in Section 3.2.
The spectral flux density $f^{\rm cor}_{n,i}$ of the corrected template
in the filter band $n$ depends on the catalog entry $i$ through
its best fitted template type $t$, redshift $z$ and normalization
factor $a$. Specifically:

$$ f^{\rm cor}_{n,i} = \sum_k \Gamma_n^i(k)\, s_t^{\rm cor}(k), \tag{B2} $$

where $\Gamma_n^i(k)$ has yet to be determined.

In the Maximum Likelihood procedure (Section 3.3),
the template-based spectral flux density $f_{z,t,n}$ is calculated
for each template $t$, filter $n$ and redshift $z$, modulo an
overall normalization constant $a$. The procedure assigns to each
entry $i$ a triple $(t(i), z(i), a(i))$ so that the $\chi^2$ is minimized.

ZEBRA uses a linear approximation to describe the
spectral flux density through the best fit template shape, i.e.:

$$ f_{n,i} = a(i)\, f_{z(i),t(i),n} \approx \sum_k \Gamma_n^i(k)\, s_{t(i)}(k), \quad \mbox{with} $$

$$ \Gamma_n^i(k) = \frac{a(i)\,(1+z(i))^2}{\int d\lambda\, \lambda^{-1}\, n(\lambda)}\; \Delta\lambda_k\, \lambda_k\, n(\lambda_k(1+z(i))), $$

where $\lambda_k$ are the rest-frame wavelengths of the template grid
and $\Delta\lambda_k$ the corresponding grid spacing.

The effect of intergalactic absorption is included easily
by extending the definition of $\Gamma_n^i(k)$ using (A3):

$$ \Gamma_n^i(k) = \frac{a(i)\,(1+z(i))^2}{\int d\lambda\, \lambda^{-1}\, n(\lambda)}\; \Delta\lambda_k\, \lambda_k\, n(\lambda_k(1+z(i)))\, e^{-\tau(\lambda_k(1+z(i)),\, z(i))}. $$
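Because the fluxes in (B2) are linear in the template shape, the $\chi^2$ of (B1) is quadratic in the correction $s_t^{\rm cor}(k) - s_t^{\rm orig}(k)$, and its minimum is the solution of a linear system. The sketch below illustrates this for the simple case without interpolated templates. It is not the ZEBRA code: the function names are hypothetical, the coefficients $\Gamma_n^i(k)$ are assumed to be precomputed and passed in as `gamma[i][n][k]`, a single pliantness `sigma` and regularization `rho` are used for all wavelengths, and a small Gaussian elimination stands in for a proper sparse solver.

```python
def solve_linear(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def correct_template(s_orig, gamma, f_obs, delta, sigma, rho):
    """Minimize the quadratic form of (B1) for one basic template.

    s_orig : original template on the wavelength grid, s_orig[k]
    gamma  : assumed precomputed coefficients, gamma[i][n][k]
    f_obs  : observed fluxes f_obs[i][n]; delta: their errors delta[i][n]
    """
    n_grid, n_entries = len(s_orig), len(f_obs)
    m = [[0.0] * n_grid for _ in range(n_grid)]
    v = [0.0] * n_grid
    for k in range(n_grid):  # pliantness term
        m[k][k] += 1.0 / sigma ** 2
    for i in range(n_entries):  # data term, averaged over entries
        for n in range(len(f_obs[i])):
            w = 1.0 / (n_entries * delta[i][n] ** 2)
            model = sum(g * s for g, s in zip(gamma[i][n], s_orig))
            resid = f_obs[i][n] - model
            for l in range(n_grid):
                v[l] += w * resid * gamma[i][n][l]
                for k in range(n_grid):
                    m[l][k] += w * gamma[i][n][l] * gamma[i][n][k]
    r = 1.0 / rho ** 2
    for k in range(n_grid - 1):  # regularization on neighbouring grid points
        m[k][k] += r
        m[k + 1][k + 1] += r
        m[k][k + 1] -= r
        m[k + 1][k] -= r
    xi = solve_linear(m, v)
    # Negative corrected fluxes are set to zero.
    return [max(s + x, 0.0) for s, x in zip(s_orig, xi)]
```

Small values of `sigma` keep the corrected template close to the original, and small values of `rho` force the correction to vary smoothly with wavelength.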

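The two-step iterative template correction then
proceeds as described in Section 3.2.

When log-interpolated templates are used we define the
set $\mathcal{N}_t$ as the set of catalog entries $i$, so that $t$ is the nearest
basic type of the best fitting type $t(i)$. If $t(i)$ is a basic
template, then $t = t(i)$; if $t(i)$ is an interpolated template, then
$t = t_1$ if $g < 0.5$, or otherwise $t = t_2$. To simplify
the notation we define:

$$ s^{\log}_{t,g}(k) = \begin{cases} (s_{t^+}(k))^{g}\, (s_t^{\rm orig}(k))^{1-g} & \mbox{if } g < 0.5 \\ (s_{t^-}(k))^{1-g}\, (s_t^{\rm orig}(k))^{g} & \mbox{if } g \ge 0.5. \end{cases} $$

Here $t^+$ and $t^-$ indicate the successor and predecessor basic
template of the basic template type $t$ with respect to the
(assumed) global ordering.

When using interpolated templates, equation (B2) has
to be re-defined. In particular, a change in the shape $s_t^{\rm orig}(k)$
leading to $s_t^{\rm cor}(k)$ is reflected in a changed spectral flux
density $f^{\rm cor}_{n,i}$ for each entry $i \in \mathcal{N}_t$. For $g(i) < 0.5$, we obtain:

$$ f^{\rm cor}_{n,i} = \sum_k \Gamma_n^i(k)\, (s_{t^+}(k))^{g(i)}\, (s_t^{\rm cor}(k))^{1-g(i)}. $$

Using:

$$ (s_t^{\rm cor}(k))^{1-g(i)} = (s_t^{\rm orig}(k))^{1-g(i)} \left(1 + \frac{\xi_t(k)}{s_t^{\rm orig}(k)}\right)^{1-g(i)} $$

and assuming that $\xi_t(k) = s_t^{\rm cor}(k) - s_t^{\rm orig}(k)$ is small in
comparison with $s_t^{\rm orig}(k)$, the following approximation holds:

$$ f^{\rm cor}_{n,i} \approx \sum_k \Gamma_n^i(k)\, (s_{t^+}(k))^{g(i)}\, (s_t^{\rm orig}(k))^{1-g(i)} \left(1 + (1-g(i))\,\frac{\xi_t(k)}{s_t^{\rm orig}(k)}\right). \tag{B3} $$

Similarly, for $g(i) \ge 0.5$, we obtain:

$$ f^{\rm cor}_{n,i} \approx \sum_k \Gamma_n^i(k)\, (s_{t^-}(k))^{1-g(i)}\, (s_t^{\rm orig}(k))^{g(i)} \left(1 + g(i)\,\frac{\xi_t(k)}{s_t^{\rm orig}(k)}\right). \tag{B4} $$

In this approximation the spectral flux density depends
linearly on $\xi_t(k)$. With the definition
$s^{\log}_{t,0} = s^{\log}_{t,1} = s_t^{\rm orig}(k)$, the equations (B3) and (B4) also
describe the change in spectral flux density if the best fit
template is an original template.

To minimize the $\chi^2_t$, the templates are sampled on a grid
linearly spaced in units of $\log(\lambda)$; all templates are
normalized to the spectral flux density of unity in the B-band, in
order to be able to use for each template the same pliantness
$\sigma$. With the definitions:

$$ g_n^i = \begin{cases} f^{\rm obs}_{n,i} - \sum_k \Gamma_n^i(k)\, (s_{t^+}(k))^{g(i)}\, (s_t^{\rm orig}(k))^{1-g(i)} & \mbox{if } g(i) < 0.5 \\ f^{\rm obs}_{n,i} - \sum_k \Gamma_n^i(k)\, (s_{t^-}(k))^{1-g(i)}\, (s_t^{\rm orig}(k))^{g(i)} & \mbox{if } g(i) \ge 0.5 \end{cases} $$

$$ C_n^i(k) = \begin{cases} (1-g(i))\, \Gamma_n^i(k)\, (s_{t^+}(k))^{g(i)}\, (s_t^{\rm orig}(k))^{-g(i)} & \mbox{if } g(i) < 0.5 \\ g(i)\, \Gamma_n^i(k)\, (s_{t^-}(k))^{1-g(i)}\, (s_t^{\rm orig}(k))^{g(i)-1} & \mbox{if } g(i) \ge 0.5 \end{cases} $$

equation (B1) can be written as:

$$ \chi^2_t = \sum_k \frac{\xi_t(k)^2}{\sigma_{t,k}^2} + \frac{1}{N_t}\sum_{i=1}^{N_t}\sum_{n=1}^{N_B} \frac{1}{\Delta_{n,i}^2}\left(g_n^i - \sum_k C_n^i(k)\, \xi_t(k)\right)^2 + \sum_k \frac{\left(\xi_t(k+1) - \xi_t(k)\right)^2}{\rho_{t,k}^2}. \tag{B5} $$

Postulating $\partial\chi^2_t/\partial\xi_t(k) = 0$ leads to a system of linear
equations in $\xi_t(k)$, i.e.:

$$ \sum_k M_t(l,k)\, \xi_t(k) = \nu_t(l), $$

where

$$ M_t(l,k) = \frac{\delta_{l,k}}{\sigma_{t,k}^2} + \frac{1}{N_t}\sum_{i=1}^{N_t}\sum_{n=1}^{N_B} \frac{C_n^i(l)\, C_n^i(k)}{\Delta_{n,i}^2} + \frac{1}{\rho_{t,k-1}^2}\left(\delta_{l,k} - \delta_{l,k-1}\right) + \frac{1}{\rho_{t,k}^2}\left(\delta_{l,k} - \delta_{l,k+1}\right), \tag{B6} $$

$$ \nu_t(l) = \frac{1}{N_t}\sum_{i=1}^{N_t}\sum_{n=1}^{N_B} \frac{g_n^i\, C_n^i(l)}{\Delta_{n,i}^2}. \tag{B7} $$

The density of the $\lambda$-grid used to sample the templates
determines the size of the set of linear equations. In the
application to the COSMOS sample described in Section 4
we have used a grid in $\log(\lambda)$-space of about 800 points.

Attention has to be paid in carefully choosing the free
parameters, in order to obtain physically meaningful
corrections to the templates when using also interpolated
templates. Specifically, if the absolute change $|\xi_t(k)|$ of a
template is larger than the value $s_t^{\rm orig}(k)$ of the original
template at that wavelength, the approximate treatment of
the log-interpolated templates becomes inappropriate. This
can happen if a too high pliantness $\sigma_{t,k}$ is used, and/or if
too few galaxies are available to constrain the fits that are
performed to correct the templates. If a correction would
make the flux of a template negative in some wavelength
region, the flux is set to zero. If that happens, the $C_n^i(k)$
coefficient is also set to zero, thereby inhibiting any further
change in that template at that specific wavelength.



APPENDIX C: TESTING ZEBRA ON A MOCK SAMPLE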



We further demonstrate the performance of ZEBRA using a
mock catalog that has been produced for the COSMOS field.
Simulations of galaxies rely on population synthesis and dust
models which may not perfectly match the observed SEDs
of real galaxies. We find indeed that the use of the galaxy
templates discussed in Section 4.1 provides slightly less
accurate photometric redshift estimates for the mock galaxies
than for real data. On the other hand, adopting the same
models that were used to construct the mock galaxies when
recovering their photo-z's results in unrealistically accurate
results. Testing the code on a mock catalog has however
several advantages, as the mock catalog provides a large set of
data with known precise redshifts, and hence allows us to
test the reliability and stability of the code using disjoint
samples for the training set (that is used for the template
correction) and for the assessment of the photo-z accuracy.

The mock catalog used for our tests contains about
50000 galaxies with $I \le 22.5$ and data in five photometric
bands (B, g, i, r, Ks). We used the same templates discussed
in Section 4.1. In order to directly compare the results
obtained with the mock data with those obtained using the
zCOSMOS spectroscopic redshifts, we limited the training
set to 1000 mock galaxies, and we used a sample of 10000
mock galaxies, disjoint from the training set, to perform the
tests.

A run of ZEBRA in photometry-check mode on the
original mock data showed that systematic photometric offsets
were smaller than the assumed relative photometric error
of 0.05 magnitudes. To test the effect of systematic
photometric offsets, we therefore added shifts of up to 0.2 mag to
the mock data. These offsets were correctly identified and
removed by ZEBRA's photometry-check mode.
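The idea of recovering constant per-band shifts can be illustrated as follows. The text does not describe how the photometry-check mode estimates the offsets, so the estimator used here (the median, per band, of observed minus best-fit template magnitude) is purely an assumption of this sketch, as are the function names and the toy numbers.

```python
from statistics import median


def band_offsets(obs_mags, model_mags):
    """Median of (observed - model) magnitude in each band.

    obs_mags[i][n] and model_mags[i][n] are the observed and best-fit
    template magnitudes of galaxy i in band n.
    """
    n_bands = len(obs_mags[0])
    return [
        median(o[n] - m[n] for o, m in zip(obs_mags, model_mags))
        for n in range(n_bands)
    ]


def apply_offsets(obs_mags, offsets):
    """Subtract the estimated systematic offsets from the observed magnitudes."""
    return [[m - d for m, d in zip(row, offsets)] for row in obs_mags]


# Toy catalog: three bands, with shifts of +0.2, 0.0 and -0.1 mag injected.
model = [[20.0 + 0.1 * i, 21.0 + 0.2 * i, 22.0 - 0.1 * i] for i in range(11)]
shifts = [0.2, 0.0, -0.1]
observed = [[m + s for m, s in zip(row, shifts)] for row in model]
print([round(d, 3) for d in band_offsets(observed, model)])
```

In a real run the estimate would presumably be iterated, since the best-fit templates themselves change once the offsets are removed.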

ZEBRA was then run in the template-optimization mode;
this was done using, for the training set, photometric data
both corrected and not corrected for the added "extra"
offsets discussed above, so as to establish the impact of systematic
photometric errors on the template correction procedure.
The entire set of original plus corrected templates was then
used in the analysis.

In Table C1 we summarize the results of applying the
Maximum-Likelihood mode of ZEBRA both to recover the
redshifts of the training set galaxies themselves, and to estimate
the redshifts of the independent set of 10000 galaxies in the
"evaluation" catalog.

Four configurations were explored, i.e., using: (a)
Catalog not corrected for photometric offsets and original
("uncorrected") templates; (b) Catalog corrected for
systematic photometric errors and again original, uncorrected
templates; (c) Catalog not corrected for photometric offsets and
corrected/optimized templates; (d) Catalog corrected for
systematic photometric errors and corrected/optimized
templates. In Figure C1 we compare the resulting photometric
redshifts for the 10000 galaxy "evaluation sample".

These tests indicate that:

(i) The accuracies of the photometric redshifts obtained
when applying ZEBRA to the galaxies of the training
sample itself and to the disjoint evaluation sample are nearly
identical (see Table C1). This shows that the results of the
photometry-check mode and template-optimization mode are
robust and lead to a high accuracy in the redshift estimates;

(ii) Systematic photometric errors may indeed lead to
substantial systematic artefacts in the photometric redshift
estimates, which need to be removed before the template
correction is performed;

(iii) Accurate redshifts without significant systematic
artefacts can only be achieved if both photometric
corrections and template corrections are employed.

This paper has been typeset from a TeX/LaTeX file prepared
by the author.



Catalog      Phot.   Templ.   $\sigma_{\Delta z/(1+z)}$   $\Delta z/(1+z)$   %
             corr.   optim.

Training     no      no       0.1008    -0.051    1.3
Training     yes     no       0.0526    -0.001    2.7
Training     no      yes      0.0785    -0.029    0.7
Training     yes     yes      0.0345     0.000    1.0
Evaluation   no      no       0.1004    -0.050    1.2
Evaluation   yes     no       0.0590    -0.001    2.4
Evaluation   no      yes      0.0780    -0.029    0.5
Evaluation   yes     yes      0.0350    -0.001    1.4


Table C1. Results of the application of ZEBRA in
Maximum-Likelihood mode to 1000 mock galaxies that are also used as
training set ("Training" catalog), and of the application of the code to
a sample of 10000 mock galaxies ("Evaluation" catalog) not
overlapping with the "Training" catalog. The second column indicates
whether the photometric catalogs are corrected for systematic
errors; the third column indicates whether the template-correction
scheme has been applied. Columns four and five list the accuracy
$\sigma_{\Delta z/(1+z)}$ and the mean offset $\Delta z/(1+z)$ of the photometric
redshift when compared with the "true" redshifts after 5$\sigma$
clipping. The percentage of 5$\sigma$ outliers is listed in the last column.
Note the high accuracy and lack of global shift that is obtained
when both the corrections to the photometric catalogs and the
template optimization are applied; also, accuracies of the same
order are obtained in the "Training" and "Evaluation" runs.

Figure C1. The comparison between the ZEBRA photometric
redshifts and the "true" redshifts of the mock galaxies. The figure
refers to the "evaluation" runs in which 1000 galaxies are used as
the training set, and the evaluation of the performance is made
on a non-overlapping sample of 10000 mock galaxies. Four
different cases are shown: (a) The catalog contains substantial
photometric offsets, and the templates are not optimized; (b) The
photometry correction scheme is now applied, but no template
optimization has yet been performed; (c) No photometric
correction is performed, but the template optimization scheme has
been applied; (d) Photometric errors are removed from both the
evaluation sample and the training sample, and the template
optimization scheme is applied.



